Applied Modelling in Drug Development

ROeS 2025, Graz

Sebastian Weber

Novartis Pharma AG

Lukas Widmer

September 14, 2025

Preface

Disclaimer

The opinions & statements expressed in this presentation and on the following slides are solely of the presenters, and not necessarily those of Novartis.

Learning Goals

After this course, you should:

  • Be familiar with brms syntax and workflow
  • Recognize its versatility for statistical modelling in drug development
  • Have hands-on experience with the package from two guided exercises

and of course:

  • Feel empowered to use brms going forward!

Housekeeping

Acknowledgements

  • Paul Buerkner (Technical University of Dortmund)
  • Andrew Bean (Novartis)
  • Björn Holzhauer (Novartis)
  • David Ohlssen (Novartis)
  • Cong Zhang (Novartis)

A brms modelling workflow

Bayesian inference

p(\theta|y) = \frac{p(y|\theta) \, p(\theta)}{p(y)}

  • Prior p(\theta)
  • Model likelihood p(y|\theta)
  • Marginal data likelihood p(y) = \int p(y|\theta) \, p(\theta) \, d\theta
  • Posterior p(\theta|y) is conditional on data y

Bayesian inference is integration

\hat{\theta} = E[\theta|y] = \int \theta \, p(\theta|y) \, d\theta

In practice solved by Markov-Chain Monte-Carlo (MCMC) in generating draws s=1, \ldots, S

\theta_s \sim p(\theta|y)

\int \theta \, p(\theta|y) \, d\theta \approx \frac{1}{S} \sum_{s=1}^S \theta_s

Stan project: Bayesian modelling platform

  • mc-stan.org
  • discourse.mc-stan.org
  • Models are written in the Stan probabilistic programming language for priors & model likelihood conditional on data
  • Hamiltonian Monte-Carlo No-U-turn sampler (NUTS)

data {
  int<lower=0> N;
  array[N] int<lower=0,upper=1> y;
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(1,1);
  y ~ bernoulli(theta);
}

brms: Bayesian regression modelling in Stan

https://paulbuerkner.com/brms/

  • Specify models via extended R formula syntax
  • Internally write Stan code that is readable yet fast
  • Provide an easy interface for defining priors & models
  • Facilitate post-processing

Some Highlights of brms

  • Flexible hierarchical (random effects) modeling
  • Both built-in and user-defined likelihoods
  • Explicit and implicit non-linear modeling
  • Distributional regression
  • Within-chain parallelization
  • Posterior and prior predictions
  • Model checking
  • Highly dense feature matrix

Model specification via formula

Varying intercept model for a single grouping factor:

formula <- y ~ 1 + x + (1 | g)

Varying intercept-slope model for a single grouping factor:

formula <- y ~ 1 + x + (1 + x | g)

Advanced non-linear terms such as Gaussian processes:

formula <- y ~ 1 + gp(x) + (1 + x | g)

… or approximate Gaussian processes:

formula <- y ~ 1 + gp(x, k=9) + (1 + x | g)

Model specification via formula

Linear formulas for multiple distributional parameters (e.g., predict mean and overdispersion of negative binomial):

formula <- bf(
  y ~ 1 + x + (1 | g) + ...,
  par2 ~ 1 + x + (1 | g) + ...,
  par3 ~ 1 + x + (1 | g) + ...,
)

Non-linear formula for a single distributional parameter:

formula <- bf(
  y ~ fun(x, nlpar1, nlpar2),
  nlpar1 ~ 1 + x + (1 | g) + ...,
  nlpar2 ~ 1 + (1 | g) + ...,
  nl = TRUE
)

Model likelihood via family

General structure:

family <- brmsfamily(
    family = "<family>", link = "<link>",
    more_link_arguments
)

Gaussian likelihood (default):

family <- brmsfamily(family = "gaussian", link = "identity",
                     link_sigma = "log")

Poisson likelihood:

family <- brmsfamily(family = "poisson", link = "log")

See also vignette("brms_families") for details on the families.

Historical control data

Use of historical control data in clinical trials

Goal: reduce control group sample size while maintaining power

  1. Collect historical data from relevant literature systematically

  2. Evaluate heterogeniety of historical data
    data quality, patient population, trial design

  3. Pre-specify trial protocol
    prior documentation, operating charachteristics

  4. Prior-data conflict?

The meta-analytic-predictive (MAP) prior1

  • Borrow from historical data under an exchangeability assumption
  • Discount/down-weight historical data through between-trial heterogeneity \tau
  • Meta-analytic-predictive (MAP) prior through generative nature of hierarchical models
  • Pre-requisites
    • essentially the same endpoint
    • similar treatment procedures
    • similar patient eligibility criteria
    • similar patient population characteristics
flowchart LR
    M((β,τ)) --> A
    M --> B
    M --> C
    M -->|prediction| T
    A((θ<sub>1</sub>)) --> YA[Y<sub>1</sub>]
    B((θ<sub>2</sub>)) --> YB[Y<sub>2</sub>]
    C((θ<sub>3</sub>)) --> YC[Y<sub>3</sub>]
    T((θ<sub>*</sub>)) -->|new trial| YT[Y<sub>*</sub>]

Example: double-blind PoC placebo controlled trial1

  • Endpoint binary ASAS20 response at week 6
    Improvement = higher rates

  • Historical control data
    8 studies with 513 patients

  • Bayesian design

    • P(\pi_t - \pi_p > 0 | y) > 0.95
    • Placebo: (“MAP prior” from historical data)
      • \pi_{p} \sim \text{Beta}(11,32)
    • Active:
      • \pi_{t} \sim \text{Beta}(0.5,1)
  • Randomization ratio 4:1 (24 vs 6)

MAP prior:

mean sd 2.5% 50% 97.5%
0.26 0.09 0.11 0.25 0.47

Generalized Meta-Analytic-Predictive model

Y is the (control) group summary data for H historical trials \begin{align*} Y_{h}|\theta_{h} &\sim f(\theta_{h}) & \forall \; h \in [1,H] \\ Y_\ast|\theta_\ast &\sim f(\theta_\ast) & \text{for new trial} \end{align*}

Exchangeability assumption: \begin{align*} g(\theta_{h})|\beta,\tau &\sim \text{Normal}(\beta, \tau^2) & \forall \; h \in [1,H] \\ g(\theta_\ast)|\beta,\tau &\sim \text{Normal}(\beta, \tau^2) & \text{for new trial} \end{align*}

  • f likelihood / g link function: Binomial/logit, Normal (known \sigma)/identity or Poisson/log
  • \tau \rightarrow 0 \Rightarrow pooling / unbound use of historical data
  • \tau \rightarrow \infty \Rightarrow stratification / no use of historical data

Running the MAP analysis in R

  • RBesT1 ::: {.cell hash=‘bamdd_graz2025_cache/revealjs/unnamed-chunk-19_d616e04ad0bfc75a9afa9cb7ddadb5e2’}
library(RBesT)
set.seed(98721487)
map_mc <- gMAP(cbind(r, n-r) ~ 1 | study,
               data=RBesT::AS,
               family=binomial,
               tau.dist="HalfNormal",
               tau.prior=1,
               beta.prior=2)
  • brms ::: {.cell hash=‘bamdd_graz2025_cache/revealjs/unnamed-chunk-20_a14374a23758824ec54b3197d4bf2bdc’}
library(brms)
# outcome modifier "| trials(n)" to specify number of trials
# intercept study random effect requested with (1 | study)
bmap_model <- bf(r | trials(n) ~ 1 + (1 | study),
                 family=binomial, center=FALSE)
bmap_model_prior <- prior(normal(0, 2), class=b, coef=Intercept) +
    prior(normal(0, 1), class=sd, coef=Intercept, group=study)
bmap_mc <- brm(bmap_model,
               data=RBesT::AS,
               prior=bmap_model_prior,
               seed=98721487, control=list(adapt_delta=0.99))

::::

study n r
Study 1 107 23
Study 2 44 12
Study 3 51 19
Study 4 39 9
Study 5 139 39
Study 6 20 6
Study 7 78 9
Study 8 35 10

:::::

brms generated Stan code & data

stancode(bmap_mc)
// generated with brms 2.21.0
functions {
}
data {
  int<lower=1> N;  // total number of observations
  array[N] int Y;  // response variable
  array[N] int trials;  // number of trials
  int<lower=1> K;  // number of population-level effects
  matrix[N, K] X;  // population-level design matrix
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  array[N] int<lower=1> J_1;  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_1;
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
}
parameters {
  vector[K] b;  // regression coefficients
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  array[M_1] vector[N_1] z_1;  // standardized group-level effects
}
transformed parameters {
  vector[N_1] r_1_1;  // actual group-level effects
  real lprior = 0;  // prior contributions to the log posterior
  r_1_1 = (sd_1[1] * (z_1[1]));
  lprior += normal_lpdf(b[1] | 0, 2);
  lprior += normal_lpdf(sd_1[1] | 0, 1)
    - 1 * normal_lccdf(0 | 0, 1);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] mu = rep_vector(0.0, N);
    mu += X * b;
    for (n in 1:N) {
      // add more terms to the linear predictor
      mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
    }
    target += binomial_logit_lpmf(Y | trials, mu);
  }
  // priors including constants
  target += lprior;
  target += std_normal_lpdf(z_1[1]);
}
generated quantities {
}
standata(bmap_mc)
$N
[1] 8

$Y
[1] 23 12 19  9 39  6  9 10

$trials
[1] 107  44  51  39 139  20  78  35

$K
[1] 1

$X
  Intercept
1         1
2         1
3         1
4         1
5         1
6         1
7         1
8         1
attr(,"assign")
[1] 0

$Z_1_1
1 2 3 4 5 6 7 8 
1 1 1 1 1 1 1 1 

$J_1
[1] 1 2 3 4 5 6 7 8

$N_1
[1] 8

$M_1
[1] 1

$NC_1
[1] 0

$prior_only
[1] 0

attr(,"class")
[1] "standata" "list"    

Summary MAP prior model

summary(bmap_mc)
 Family: binomial 
  Links: mu = logit 
Formula: r | trials(n) ~ 1 + (1 | study) 
   Data: RBesT::AS (Number of observations: 8) 

Multilevel Hyperparameters:
~study (Number of levels: 8) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.37      0.21     0.04     0.86 1.00     1104     1126

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.11      0.19    -1.48    -0.73 1.00     1581     1753
  • Intercept is the typical trial response rate on the logit scale
  • sd(Intercept) is the between-trial heterogeneity \tau

Predicting the placebo response rate in a new study

AS_new <- data.frame(study = "new_study", n = 1)
pe <- posterior_epred(
    bmap_mc, newdata = AS_new,
    allow_new_levels = TRUE, 
    sample_new_levels = "gaussian"
)
posterior_summary(pe)
      Estimate  Est.Error      Q2.5     Q97.5
[1,] 0.2566723 0.08531282 0.1139468 0.4631436
  • allow_new_levels=TRUE allows to simulate the study random effect for new (unseen) studies
  • sample_new_levels = "gaussian" ensures that the simulation is done using the normal hierarchical model (the default is to bootstrap existing levels)

Approximation with a finite mixture

For specifying the prior as part of a pre-specified analysis we choose a parametric approximation:

pe_mix <- RBesT::automixfit(pe[, 1], type = "beta")
print(pe_mix, digits=3)
EM for Beta Mixture Model
Log-Likelihood = 4543.712

Univariate beta mixture
Mixture Components:
  comp1   comp2   comp3   comp4  
w   0.460   0.210   0.199   0.130
a  35.378  23.503  12.515   2.595
b 108.674  50.521  58.228   5.490
plot(pe_mix)$mix

Region specific MAP prior (hypothetically)

withr::with_seed(654873,
                 AS_region <- bind_cols(AS, region=sample(c("asia", "europe", "north_america"), 8, TRUE)))
kable(AS_region)
study n r region
Study 1 107 23 europe
Study 2 44 12 asia
Study 3 51 19 europe
Study 4 39 9 europe
Study 5 139 39 asia
Study 6 20 6 north_america
Study 7 78 9 europe
Study 8 35 10 asia

Region specific MAP Priors via varying regions model

form_AS_region <- bf(r | trials(n) ~ 1 + (1 | region/study), 
                     family = binomial, center=FALSE)

# show which priors brms defines for this model
get_prior(form_AS_region, AS_region)

bprior_AS_region <- prior(normal(0, 2), class=b, coef=Intercept) +
  prior(normal(0, 0.8), class=sd, coef=Intercept, group=region) +
  prior(normal(0, 0.6), class=sd, coef=Intercept, group=region:study)

fit_AS_region <- brm(
  form_AS_region, data = AS_region, prior = bprior_AS_region, seed = 29856341,
  control=list(adapt_delta=0.99))
  • (1 | region/study) specifies a nested random effect as each study is run in a single region. It is equivalent to
    (1 | region) + (1 | region:study).

Summary region model

summary(fit_AS_region)
 Family: binomial 
  Links: mu = logit 
Formula: r | trials(n) ~ 1 + (1 | region/study) 
   Data: AS_region (Number of observations: 8) 

Multilevel Hyperparameters:
~region (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.35      0.30     0.01     1.13 1.00     1525     1633

~region:study (Number of levels: 8) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.34      0.19     0.02     0.76 1.00     1184     1173

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.07      0.31    -1.68    -0.41 1.00     1614     1509

Extract MAP MCMC samples

AS_region_new <- data.frame(study = "new_study_asia", 
                            n = 1, region = "asia")

pe_region <- posterior_epred(
  fit_AS_region, newdata = AS_region_new, 
  allow_new_levels = TRUE, 
  sample_new_levels = "gaussian"
)
dim(pe_region)
[1] 4000    1
posterior_summary(pe_region)
      Estimate  Est.Error      Q2.5     Q97.5
[1,] 0.2744138 0.08775661 0.1241564 0.4937323

Obtain parametric MAP prior

pe_mix_region <- RBesT::automixfit(pe_region[, 1], type = "beta")
plot(pe_mix_region)$mix

Summary historical control data

  • Bayesian random-effects meta-analysis models can be used to derive Meta-Analytic-Predictive (MAP) priors
  • Predictions for new study mean inform the MAP prior
  • Specification and inference for MAP models is simple in brms
  • Not covered here:

Hands-on exercises: historical control data

Priors

The prior can only be understood in the context of the model 1

Commonly heard reasons to be afraid of priors

“Priors are inherently subjective”

“Priors bias your analysis”

“You should let the data speak for themselves”

“Non-Bayesian models don’t have priors and don’t need them”

“I have no idea how to set appropriate priors”

Are priors inherently subjective?

Consider using the alternative terms “contextual” or “conditional”1

  • on the person or people or agent performing the analysis
  • on results of previous studies
  • on structural assumptions

Do priors bias your analysis?

Definition:

An estimator \hat{\theta} of the parameter \theta is called unbiased if \text{E}_{y}(\hat{\theta}) = \theta.

Corrollary:

If an estimator \hat{\theta} is unbiased for a given prior, it will be biased for most other priors.

Corrollary:

If an estimator \hat{\theta} is unbiased at a given scale, it will be biased for most non-linear transformations f(\hat{\theta}) (Jensen’s inequality): \text{E}_y(f(\hat{\theta})) \neq f(\theta)

Unbiasedness is fragile and often only holds in the limit

Is bias even the right metric?

A sampling distribution is a distribution of a data-dependent variable (a “statistic”) as implied by variation in data y across the true data generating process

Consider two estimators \hat{\theta}_1 and \hat{\theta}_2 of \theta with sampling distributions given by

  • p(\hat{\theta}_1) := p(\hat{\theta}_1(y)) = \text{Normal}(\theta, 10)
  • p(\hat{\theta}_2) := p(\hat{\theta}_2(y)) = \text{Normal}(\theta + 0.2, 1)

Which of them is better?

Winning the bias-variance trade-off

The RMSE can be defined as \begin{align*} \text{RMSE}_y(\hat{\theta}, \theta) & = \sqrt{\int (\hat{\theta}(y) - \theta)^2 \; p(y) \; d y} \\ & = \sqrt{\text{Var}_y{\hat{\theta}} + \text{Bias}_y(\hat{\theta}, \theta) ^2} \end{align*}

It expresses a trade-off between variance and bias

A lot of priors will help you win the bias-variance trade-off, not the race for minimal bias!

Non-Bayesian models don’t have priors?

Basic multilevel model with J groups:

y ~ normal(b0[j] + b1 * x, sigma)
b0[j] ~ normal(b0, sd0)

Is b0[j] ~ normal(b0, sd0) a likelihood, prior, or something else?

“Non-informative” uniformity is informative!

Principles of specifying priors

  • Respecting boundaries: use hard bounds in priors exactly where parameters have natural bounds

  • Expressiveness: use prior families flexible enough to express different plausible prior knowledge

  • Controlling complexity: If you want to be conservative, put relevant amount of prior probability at or near a conservative submodel

  • Scale awareness: ensure that priors take the scales and measurement unit of parameters into account

  • Reflecting structure: reflect the structure of the data in the structure of the prior

  • Data Informed: Use previous data to inform the current priors

Data informed priors

Evaluating and comparing priors

  • Consider the unit of a parameter

  • Visualize the prior

  • Categorization of parameter into
    ranges

  • Comparisons of priors through
    interval probabilities

Example: Binary response rates for a non-rare or frequent event

Category Percentage range
Extreme 0–1% & 99–100%
Very Small 1–5% & 95–99%
Small 5–10% & 90–95%
Moderate 10–20% & 80–90%

Prior visualization of a log-odds intercept

Example default priors

wide
Normal(0, 10^2)
brms
Student-t(3, 0, 2.5^2)
RBesT
Normal(0, 2^2)

Prior intervals of a log-odds intercept

Category Percentage range
Extreme 0–1% & 99–100%
Very Small 1–5% & 95–99%
Small 5–10% & 90–95%
Moderate 10–20% & 80–90%

Advanced: Prior tails

What is the posterior?

  • … if the prior is \theta \sim \text{student-t}(4, 5, 1)

  • … and the likelihood alone implies a posterior of \theta \sim \text{normal}(-5, 1)

Student-t(4) prior vs. normal likelihood

Normal prior vs. student-t(4) likelihood

Normal prior vs. normal likelihood

Student-t(4) prior vs. student-t(4) likelihood

Dose finding

Typical dose response shapes

Typical dose response shapes

Typical dose response shapes

Data Set PATHWAY

  • Placebo controlled trial
  • Treatment of severe asthma with tezepelumab
  • Three different doses + placebo
  • Endpoint: annualized rate of asthma exacerbations
  • Estimates per arm from negative binomial regression (like in “arm-based meta-analysis”), not individual patient data
dose group log_est log_stderr
0 placebo −0.400 0.103
70 tezepelumab 70 mg q4w −1.347 0.177
210 tezepelumab 210 mg q4w −1.661 0.222
560 tezepelumab 280 mg q2w −1.514 0.191

Sigmoid Emax Model

f(\text{dose}; \text{parameters}) = \text{E}_0 + \text{E}_\text{max} \frac{\text{dose}^h}{\text{dose}^h + \text{ED}_{50}^h}

Parameters:

  • \text{E}_0 \in \mathbb{R}: Expected placebo outcome
  • \text{E}_\text{max} \in \mathbb{R}: Maximum effect size
  • h \in \mathbb{R}_+: Hill (steepness) parameter
  • \text{ED}_{50}: Dose at which 50% of \text{E}_\text{max} is reached

Sigmoid Emax Model: Visualization

Specifying sigmoid Emax Model with brms

form_sig <- bf( 
  log_est | se(log_stderr) ~ E0 + Emax * dose^h / 
                             (dose^h + ED50^h),
  nlf(h ~ exp(logh)), nlf(ED50 ~ exp(logED50)),
  E0 ~ 1, Emax ~ 1, logh ~ 1, logED50 ~ 1,
  nl = TRUE,
  family = gaussian()
)

prior_sig <- prior(normal(0,1), nlpar=E0) +
  prior(normal(0,1), nlpar=logh) +
  prior(normal(0,1), nlpar=Emax) +
  prior(normal(4,2), nlpar=logED50)

A note on Emax Model priors

What is more informative?

  • \text{logh} \sim \text{normal}(0,1)
  • \text{logh} \sim \text{normal}(0,3)

A note on Emax Model priors

A note on Emax Model priors

  • Biologically plausible values: approx. .25 < h < 4 (see prior interval probabilities).
    The above prior puts roughly 1/3 above 4 and 1/3 below 0.25!

  • Re-writing the fraction to \left(\frac{d}{\text{ED}_{50}}\right)^h / \left(1+\left(\frac{d}{\text{ED}_{50}}\right)^h\right) makes it obvious that for h \rightarrow 0, this goes to 1/2, and for h \rightarrow \infty, either to 0 (d < \text{ED}_{50}) or 1 (d > \text{ED}_{50}).

Fitting the sigmoid Emax Model with brms

fit_sig <- brm(
  formula = form_sig, 
  data = pathway, 
  prior = prior_sig,
  control = list(adapt_delta = 0.999)
) 

Sigmoid Emax Model: Results Summary

summary(fit_sig)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: log_est | se(log_stderr) ~ E0 + Emax * dose^h/(dose^h + ED50^h) 
         h ~ exp(logh)
         ED50 ~ exp(logED50)
         E0 ~ 1
         Emax ~ 1
         logh ~ 1
         logED50 ~ 1
   Data: pathway (Number of observations: 4) 

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
E0_Intercept         -0.42      0.10    -0.61    -0.21 1.00     2065     2179
Emax_Intercept       -1.30      0.32    -2.11    -0.84 1.00     1172     1199
logh_Intercept       -0.08      0.98    -1.90     1.92 1.00     1306     1914
logED50_Intercept     2.73      1.38    -0.27     5.32 1.00     1341     1273

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.00      0.00     0.00     0.00   NA       NA       NA

Visualizing the Fitted Sigmoid Emax Model

tibble(dose = seq(0, 560, length.out=51), log_stderr=1) |>
  tidybayes::add_epred_rvars(object=fit_sig) |>
  mutate(.delta = .epred - .epred[dose == 0]) |>
  ggplot(aes(x=dose, ydist=.delta)) +
  ggdist::stat_lineribbon()

Modified Beta Model

f(\text{dose}; \text{parameters}) = \text{E}_0 + \text{E}_\text{max} \; \frac{(\delta_1 + \delta_2)^{(\delta_1+\delta_2)}} {\delta_1^{\delta_1} \, \delta_2^{\delta_2}} \; \left(\frac{\text{dose}}{S}\right)^{\delta_1} \left(1-\frac{\text{dose}}{S}\right)^{\delta_2}

Parameters:

  • \text{E}_0 \in \mathbb{R}: Expected placebo response
  • \text{E}_\text{max} \in \mathbb{R}: Maximum effect size
  • \delta_1, \delta_2 \in \mathbb{R}_+: Shape parameters
  • S: constant > maximum dose, e.g. 1.5 \times \max(\text{dose}), here we choose S=850

Specifying the Modified Beta Model with brms

form_mbeta <- bf( 
  log_est | se(log_stderr) ~ E0 +   
    Emax * (delta1+delta2)^(delta1+delta2) /
    (delta1^delta1 * delta2^delta2) * 
    (dose/850)^delta1 * (1-dose/850)^delta2,
  nlf(delta1 ~ exp(logdelta1)), nlf(delta2 ~ exp(logdelta2)),
  E0 ~ 1, Emax ~ 1, logdelta1 ~ 1, logdelta2 ~ 1,
  nl = TRUE,
  family = gaussian()
)

prior_mbeta <- prior(normal(0,1), nlpar="E0") +
  prior(normal(0,1), nlpar="Emax") +
  prior(normal(0,1), nlpar="logdelta1") +
  prior(normal(0,1), nlpar="logdelta2")

Fitting the Modified Beta Model with brms

fit_mbeta <- brm(
  form_mbeta, 
  data = pathway, 
  prior = prior_mbeta,
  control = list(adapt_delta = 0.999)
)

Visualizing the Fitted Modified Beta Model

Model Evaluation (failed attempt)

(loo_mbeta <- loo(fit_mbeta))

Computed from 4000 by 4 log-likelihood matrix.

         Estimate  SE
elpd_loo      0.1 0.4
p_loo         2.2 0.6
looic        -0.2 0.7
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 0.6]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     1     25.0%   1282    
   (0.7, 1]   (bad)      3     75.0%   <NA>    
   (1, Inf)   (very bad) 0      0.0%   <NA>    
See help('pareto-k-diagnostic') for details.
  • brms interfaces with the loo R package that performs efficient approximate leave-one-out cross-validation (LOO) for Bayesian models by Pareto smoothed importance sampling (PSIS). See Vethari et al. (2017) and the loo documentation on CRAN.

Model Evaluation (works)

loo_exact_sig <- kfold(fit_sig, folds = "loo", silent=2)
(loo_exact_mbeta <- kfold(fit_mbeta, folds = "loo", silent=2)) 

Based on 4-fold cross-validation.

           Estimate  SE
elpd_kfold     -2.2 1.8
p_kfold         4.5 2.3
kfoldic         4.4 3.6
  • elpd_kfold: CV estimate of the expected log pointwise predictive density for a new dataset
  • p_kfold: CV estimate of difference between elpd_loo and the non-cross-validated log posterior predictive density (describes how much more difficult it is to predict future data than the observed data).
  • kfoldic: -2 * elpd_kfold (on the conventional scale of deviance)

Model Comparison

loo_compare(loo_exact_sig, loo_exact_mbeta) 
          elpd_diff se_diff
fit_sig    0.0       0.0   
fit_mbeta -1.9       0.9   
  • loo_compare computes a paired estimate of the difference in ELPD to take advantage of the fact that the same set of data points was used to fit both models (and reduce the standard error). See Vethari et al. (2017) and the loo R package manual for details.
fit_sig$criteria$loo <- loo_exact_sig
fit_mbeta$criteria$loo <- loo_exact_mbeta
(w_dose <- model_weights(fit_sig, fit_mbeta, weights = "loo")) 
  fit_sig fit_mbeta 
0.8671182 0.1328818 
  • Here, model_weights from brms will compute Akaike weights based on the information criterion values returned by loo

Bayesian Model Averaging

pe_sig <- posterior_epred(fit_sig, newdata = dose_df)
pe_mbeta <- posterior_epred(fit_mbeta, newdata = dose_df)
pe_avg <- pe_sig * w_dose[1] + pe_mbeta * w_dose[2]  
pe_avg <- pe_avg |>
  posterior_summary() |>
  as.data.frame() |> 
  bind_cols(dose_df) 
    Estimate  Est.Error       Q2.5      Q97.5      dose
1 -0.4164914 0.08891933 -0.5853796 -0.2396410  0.000000
2 -0.8327392 0.26642276 -1.3421644 -0.3906439  5.656566
3 -0.9587280 0.26445532 -1.4106496 -0.4461609 11.313131
4 -1.0438294 0.25120451 -1.4518520 -0.4946240 16.969697

Visualizing the Model Averaging

Hands-on exercises: dose finding

Bayesian Mixed Models for Repeated Measures

Hypothetical dose finding study, N=200

Continuous outcome measured at baseline & weeks 2, 4, 8, 12

Overview and Analysis Goals

  • Mixed Effects Model for Repeated Measures (MMRM) commonly used for longitudinal data (each patient measured at multiple visits)

  • Direct likelihood analysis that can address hypothetical estimand of all patients completing the study on treatment without doing missing data imputation first

  • Commonly no structure assumed for correlations between visits and different variance allowed for different visits (to avoid unnecessary assumptions)

  • Convergence issues with REML fit common, especially for small sample sizes, which is alleviated by Bayesian inference with (weakly-)informative priors

  • Bayesian approach allows us to incorporate prior information and historical data, which is very interesting for Phase I studies

  • brms lets us easily add more components & structure to the model

What do our data look like?

Analysis Data Model (ADaM) Basic Data Structure (BDS)

USUBJID TRT01P AVISIT ADY AVAL CHG BASE
3 10 visit1 14 1.35 0.58 0.78
3 10 visit2 28 1.17 0.39 0.78
3 10 visit3 56 −0.38 −1.15 0.78
3 10 visit4 84 1.34 0.56 0.78
9 10 visit1 14 0.99 0.88 0.11
9 10 visit2 28 1.33 1.22 0.11
9 10 visit3 56 −1.11 −1.21 0.11
9 10 visit4 84 1.14 1.04 0.11
13 40 visit1 14 2.40 0.50 1.90
13 40 visit2 28 1.92 0.02 1.90

Patients can miss visits or drop-out of the study prior to the final visit.

Model Specification: Formal

Formally, let us assume that there are V visits. We assume that the V-dimensional response \boldsymbol{Y}_i for patient i satisfies \boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{Z}_i \boldsymbol{b}_i + \boldsymbol{\epsilon}_i

with \boldsymbol{b}_i \sim \text{MVN}(\boldsymbol{0}, \boldsymbol{D}) and \boldsymbol{\epsilon}_i \sim \text{MVN}(\boldsymbol{0}, \boldsymbol{\Sigma}), where \boldsymbol{\Sigma} is a diagonal matrix. This implies \boldsymbol{Y}_i \sim \text{MVN}(\boldsymbol{X}_i \boldsymbol{\beta}, \boldsymbol{V}_i), where \boldsymbol{V}_i = \boldsymbol{Z}_i \boldsymbol{D} \boldsymbol{Z}_i^T + \boldsymbol{\Sigma}. Model the correlated Y_{ij} either by

  1. marginalizing out random effects & account for them with correlated residual errors (residual covariance matrix V_i), or
  2. conditionally on (V-dimensional) random effects \boldsymbol{b}_i with residual errors \boldsymbol{\epsilon}_i independent (once we condition on \boldsymbol{b}_i).

MMRMs in SAS

Widely-used high-quality reference implementation

PROC MIXED DATA=simulated_data;
  CLASS TRT01P AVISIT USUBJID;
  MODEL CHG ~ TRT01P AVISIT BASE TRT01P*AVISIT AVISIT*BASE 
    / SOLUTION DDFM=KR ALPHA = 0.05;
  REPEATED AVISIT / TYPE=UN SUBJECT = USUBJID R Rcorr GROUP=TRT01P;
  LSMEANS TRT01P*AVISIT / DIFFS PDIFF CL OM E;
RUN;

MMRMs with the mmrm R package

Fit the model in R in a frequentist framework

library(mmrm)
mmrm_fit <- mmrm(
  formula = CHG ~ TRT01P + AVISIT + BASE + AVISIT:TRT01P + 
    AVISIT:BASE + us(AVISIT | TRT01P / USUBJID),
  method = "Kenward-Roger", 
  vcov = "Kenward-Roger-Linear", # to match SAS
  data = mutate(simulated_data, USUBJID=factor(USUBJID))
)

Refer to https://openpharma.github.io/mmrm.

Motivation Forward Difference Parametrization

We would like to put priors on the differences from visit to visit

Difference contrasts in R

Setup forward difference contrasts for changes between visits:

contrasts(simulated_data$AVISIT) <- MASS::contr.sdif

Hard to interpret the contrast matrix directly:

contrasts(simulated_data$AVISIT) |> MASS::fractions()
       visit2-visit1 visit3-visit2 visit4-visit3
visit1 -3/4          -1/2          -1/4         
visit2  1/4          -1/2          -1/4         
visit3  1/4           1/2          -1/4         
visit4  1/4           1/2           3/4         

Learn more about contrasts: https://bbolker.github.io/mixedmodels-misc/notes/contrasts.pdf

What do these contrasts mean?

Inverting the contrast matrix reveals the dummy variables’ interpretation:

# add the intercept
cmat <- cbind("Intercept" = 1, contrasts(simulated_data$AVISIT))
# compute the inverse matrix
solve(cmat) |> MASS::fractions()
              visit1 visit2 visit3 visit4
Intercept     1/4    1/4    1/4    1/4   
visit2-visit1  -1      1      0      0   
visit3-visit2   0     -1      1      0   
visit4-visit3   0      0     -1      1   

MMRMs in brms

mmrm_model1 <- bf(
  CHG ~ 1 + AVISIT + BASE + BASE:AVISIT + TRT01P + TRT01P:AVISIT 
    + unstr(time = AVISIT, gr = USUBJID),
  sigma ~ 1 + AVISIT + TRT01P + AVISIT:TRT01P
)
mmrm_prior1 <- prior(normal(0, 2), class=Intercept) +
    prior(normal(0, 1), class=b) +
    prior(normal(0, log(10.0)/1.64), class=Intercept, dpar=sigma) +
    prior(normal(0, log(2.0)/1.64), class=b, dpar=sigma) +  
    prior(lkj(1), class=cortime)
fit_mmrm1 <- brm(
  formula = mmrm_model1, data = simulated_data, prior = mmrm_prior1, ...
)

Note that a single corrleation matrix is estimated while the covariance varies as per the sigma model.

Expected marginal (least-squares) means

emm2 <- emmeans(fit_mmrm1, ~ TRT01P | AVISIT, weights="proportional")
AVISIT = visit1:
 TRT01P  emmean lower.HPD upper.HPD
 0      -0.3231   -0.5000   -0.1290
 10      0.0907   -0.0949    0.2756
 20      0.0630   -0.1277    0.2434
 40     -0.0866   -0.3335    0.1498

AVISIT = visit4:
 TRT01P  emmean lower.HPD upper.HPD
 0      -0.2036   -0.4601    0.0686
 10      0.2220   -0.0281    0.4510
 20      0.1079   -0.1345    0.3562
 40     -0.0430   -0.4115    0.2811

Point estimate displayed: median 
HPD interval probability: 0.95 

To work with MCMC samples of the expected marginal means & to summarize them exactly as we want (e.g. quantile credible intervals instead of highest posterior density intervals) we can use:

emm2 |> as.mcmc() |> summarize_draws()

Expected Marginal Contrasts per Visit

contrast(emm2, adjust="none", method="trt.vs.ctrl", ref="TRT01P0")
AVISIT = visit1:
 contrast           estimate lower.HPD upper.HPD
 TRT01P10 - TRT01P0    0.410    0.1491     0.689
 TRT01P20 - TRT01P0    0.381    0.1370     0.654
 TRT01P40 - TRT01P0    0.237   -0.0779     0.536

AVISIT = visit4:
 contrast           estimate lower.HPD upper.HPD
 TRT01P10 - TRT01P0    0.430    0.0657     0.777
 TRT01P20 - TRT01P0    0.311   -0.0637     0.661
 TRT01P40 - TRT01P0    0.161   -0.2617     0.594

Point estimate displayed: median 
HPD interval probability: 0.95 

as.mcmc() to work with MCMC samples of the difference, e.g.

contrast(emm2, adjust="none", method="trt.vs.ctrl", ref="TRT01P0") |> as.mcmc()

Monotonic Effects Across Ordered Factor Levels

mmrm_model2 <- bf( 
  CHG ~ 1 + AVISIT + mo(TRT01P) + BASE + mo(TRT01P):AVISIT 
    + BASE:AVISIT + unstr(time = AVISIT, gr = USUBJID),
  sigma ~1 + AVISIT + mo(TRT01P) + mo(TRT01P):AVISIT
)

For category c=0,\ldots,(\text{categories}-1) = C-1, the monotonic term is \text{coefficient} \times (C-1) \times \sum_{k=1}^{c} \zeta_k

  • \zeta_k \in [0, 1] is the percent increase and is subject to the constraint \sum_{k=1}^{C-1} \zeta_k = 1

  • \text{coefficient} is the mean change between adjacent categories

For more details see the respective vignette:
https://paulbuerkner.com/brms/articles/brms_monotonic.html

fit_mmrm2 <- brm(formula = mmrm_model2,
                 data = simulated_data |> mutate(TRT01P=ordered(TRT01P)), prior = mmrm_prior1, ...)

Results from different MMRMs

MMRMs: Outlook

In the online case study on http://opensource.nibr.com/bamdd you additionally find:

  • Data and full code
  • More on estimands, parametrization, contrasts & priors
  • Estimating average differences across visits
  • Meta-analytic combined (MAC) using historical data
  • Robustifying MAC via a “slab-and-spike”-type prior
  • Non-linear functions over time & doses in MMRMs
  • Excercises

Longitudinal data

Longitudinal Data: Background

  • Example: simulated study of a drug for the treatment of Psoriasis
  • 112 patients, 59 randomized to placebo, 53 to drug.
  • Efficacy was assessed using the Psoriasis Area and Severity Index (PASI), a numerical score which measures the severity and extent of psoriasis (0: no disease, 72: maximum).
  • Assessed at baseline and 7 post-baseline timepoints.

Longitudinal Data: Illustration

Longitudinal Data: Key Variables

# A tibble: 10 × 6
      CHG  BASE TRT01P AVISIT  AVISITN SUBJID
    <dbl> <dbl> <fct>  <fct>     <dbl>  <dbl>
 1 -2.2    16   PBO    Week 1        1      1
 2 -0.5    16   PBO    Week 2        2      1
 3  4.7    16   PBO    Week 4        4      1
 4  7.3    16   PBO    Week 6        6      1
 5 32.1    16   PBO    Week 8        8      1
 6 10      16   PBO    Week 10      10      1
 7 20.7    16   PBO    Week 12      12      1
 8  3.4    20.3 TRT    Week 1        1      2
 9  1.2    20.3 TRT    Week 2        2      2
10  0.600  20.3 TRT    Week 4        4      2

Longitudinal Data: Mean Model

fit_long_mean <- brm(
  CHG ~ BASE + TRT01P * AVISIT + (1 | SUBJID), 
  data = pasi_data, seed = 2454
)
fit_long_mean <- add_criterion(fit_long_mean, "loo")

Total elapsed fitting time (seconds):

sum(rstan::get_elapsed_time(fit_long_mean$fit))
[1] 5.349

Mean Model: Posterior predictive checks

pp_check(fit_long_mean)
  • pp_check in brms performs posterior predictive checks with the help of the bayesplot package (shows dens_overlay by default).
    • Can also be called with type loo_pit_qq (probability integral transformation). In an ideal model fit, the resulting PIT-transformed values would be uniformly distributed: i.e. the blue points and dashed lines would align with the distribution function of a Uniform(0,1) variable (solid line). See chapter 12 of BAMDD for details.

Mean Model: LOO-CV results

loo(fit_long_mean)

Computed from 4000 by 684 log-likelihood matrix.

         Estimate   SE
elpd_loo  -2285.9 33.1
p_loo       102.1  9.4
looic      4571.8 66.3
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 2.0]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     682   99.7%   138     
   (0.7, 1]   (bad)        2    0.3%   <NA>    
   (1, Inf)   (very bad)   0    0.0%   <NA>    
See help('pareto-k-diagnostic') for details.
  • While this is quick, is this what we are interested in?
    • If we’re interested in the prediction for a new (unseen) patient, we should be running leave-one-patient-out CV (not leave-one-observation-out)…
      See kfold_split_grouped in the loo package.
    • For the Emax model data: not possible (IPD needed!)

Mean Model: Visualization of effects

conditional_effects(fit_long_mean, "AVISIT:TRT01P")
  • By default, numeric variables will be conditionalized by using their means and factors will get their first level assigned.
  • We might be interested in the effect integrated out over a population \rightarrow emmeans

Longitudinal Data: Linear Model

fit_long_lin <- brm(
  CHG ~ BASE + TRT01P * AVISITN + (1 | SUBJID), 
  data = pasi_data, seed = 2454
)
fit_long_lin <- add_criterion(fit_long_lin, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_lin$fit))
[1] 4.785

Linear Model: Summary

summary(fit_long_lin)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: CHG ~ BASE + TRT01P * AVISITN + (1 | SUBJID) 
   Data: pasi_data (Number of observations: 684) 

Multilevel Hyperparameters:
~SUBJID (Number of levels: 112) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     5.20      0.46     4.39     6.18 1.00     1298     2441

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept            13.13      1.77     9.72    16.77 1.00     1419     2234
BASE                 -0.70      0.08    -0.86    -0.55 1.00     1214     2196
TRT01PTRT            -6.02      1.37    -8.61    -3.37 1.00     1412     1881
AVISITN              -0.22      0.09    -0.40    -0.05 1.00     3248     3431
TRT01PTRT:AVISITN    -0.72      0.13    -0.97    -0.46 1.00     2906     2697

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     6.52      0.19     6.15     6.91 1.00     3871     2875

Linear Model: Visualization of effects

conditional_effects(fit_long_lin, "AVISITN:TRT01P")

Longitudinal Data: Linear Model with Varying Slopes

fit_long_lin_vs <- brm(
  CHG ~ BASE + TRT01P * AVISITN + (1 + AVISITN | SUBJID),
  data = pasi_data, seed = 2454
)
fit_long_lin_vs <- add_criterion(fit_long_lin_vs, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_lin_vs$fit))
[1] 14.752

Linear Model: Summary

summary(fit_long_lin_vs)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: CHG ~ BASE + TRT01P * AVISITN + (1 + AVISITN | SUBJID) 
   Data: pasi_data (Number of observations: 684) 

Multilevel Hyperparameters:
~SUBJID (Number of levels: 112) 
                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)              5.25      0.55     4.24     6.41 1.00     1535     2053
sd(AVISITN)                1.10      0.09     0.94     1.31 1.01      680     1056
cor(Intercept,AVISITN)    -0.54      0.09    -0.70    -0.35 1.01      504     1035

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept             9.28      1.69     6.11    12.67 1.00     1354     2501
BASE                 -0.50      0.08    -0.65    -0.35 1.00     1142     2018
TRT01PTRT            -6.46      1.23    -8.85    -4.02 1.00     1834     2717
AVISITN              -0.24      0.16    -0.56     0.09 1.00      937     1525
TRT01PTRT:AVISITN    -0.70      0.24    -1.18    -0.24 1.00      826     1356

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     4.67      0.15     4.39     4.97 1.00     3732     3319

Linear Model with Varying Slopes: Visualization of effects

conditional_effects(fit_long_lin_vs, "AVISITN:TRT01P")

Model Comparison

loo_compare(fit_long_lin, fit_long_lin_vs)
                elpd_diff se_diff
fit_long_lin_vs    0.0       0.0 
fit_long_lin    -173.8      20.2 

Longitudinal Data: Monotonic Model

fit_long_mo <- brm(
  CHG ~ BASE + TRT01P * mo(AVISITN) + 
    (1 + mo(AVISITN) | SUBJID),
  data = pasi_data, seed = 24543
)
fit_long_mo <- add_criterion(fit_long_mo, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_mo$fit))
[1] 46.254

Monotonic Model: Visualization of effects

conditional_effects(fit_long_mo, "AVISITN:TRT01P")

Model Comparison

loo_compare(fit_long_lin_vs, fit_long_mo)
                elpd_diff se_diff
fit_long_mo       0.0       0.0  
fit_long_lin_vs -93.1      11.7  

Longitudinal Data: Quadratic Model

fit_long_quad <- brm(
  CHG ~ BASE + TRT01P * (AVISITN + I(AVISITN^2)) + 
    (1 + AVISITN + I(AVISITN^2) | SUBJID),
  data = pasi_data, seed = 245124
)
fit_long_quad <- add_criterion(fit_long_quad, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_quad$fit))
[1] 69.969

Quadratic Model: Visualization of effects

conditional_effects(fit_long_quad, "AVISITN:TRT01P")

Longitudinal Data: Centering the time variable

Non-centered variables are often highly correlated with their square

cor(pasi_data$AVISITN, pasi_data$AVISITN^2)
[1] 0.9734303

Centering can be done around any central value. Here we use 6 because this is the average week in the study

pasi_data <- pasi_data |> 
  mutate(AVISITN_c = AVISITN - 6)

After centering, the correlation is drastically reduced

cor(pasi_data$AVISITN_c, pasi_data$AVISITN_c^2)
[1] 0.2548532

Longitudinal Data: Centered Quadratic model

fit_long_quad_c <- brm(
  CHG ~ BASE + TRT01P * (AVISITN_c + I(AVISITN_c^2)) + 
    (1 + AVISITN_c + I(AVISITN_c^2) | SUBJID),
  data = pasi_data, seed = 24547
)
fit_long_quad_c <- add_criterion(fit_long_quad_c, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_quad_c$fit))
[1] 24.492

Centered Quadratic Model: Visualization of effects

conditional_effects(fit_long_quad_c, "AVISITN_c:TRT01P")

Model Comparison

loo_compare(fit_long_mo, fit_long_quad_c)
                elpd_diff se_diff
fit_long_mo       0.0       0.0  
fit_long_quad_c -18.5      10.0  

Longitudinal Data: Gaussian Process Model

fit_long_gp <- brm(
  CHG ~ BASE + gp(AVISITN_c, by = TRT01P, k=9) + 
    (1 + (AVISITN_c + I(AVISITN_c^2)) | SUBJID),
  data = pasi_data, seed = 32454,
  control = list(adapt_delta = 0.95)
)
fit_long_gp <- add_criterion(fit_long_gp, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_gp$fit))
[1] 148.781

Gaussian Process Model: LOO-CV results

loo(fit_long_gp)

Computed from 4000 by 684 log-likelihood matrix.

         Estimate   SE
elpd_loo  -2042.8 35.6
p_loo       191.8 15.8
looic      4085.6 71.1
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.3, 1.7]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     644   94.2%   95      
   (0.7, 1]   (bad)       35    5.1%   <NA>    
   (1, Inf)   (very bad)   5    0.7%   <NA>    
See help('pareto-k-diagnostic') for details.

Gaussian Process Model: Visualization of effects

conditional_effects(fit_long_gp, "AVISITN_c:TRT01P")

Model Comparison

loo_compare(fit_long_quad_c, fit_long_gp)
                elpd_diff se_diff
fit_long_gp       0.0       0.0  
fit_long_quad_c -13.8       6.1  

Recall that model comparisons may not be trustworthy.

Longitudinal Data: Gaussian Process AR1 Model

fit_long_gp_ar <- brm(
  CHG ~ BASE + gp(AVISITN_c, by = TRT01P, k=9) + 
    ar(AVISITN_c, gr = SUBJID, p = 1) +
    (1 + (AVISITN_c + I(AVISITN_c^2)) | SUBJID),
  data = pasi_data, seed = 24235,
  iter = 4000, warmup = 1000,
  control = list(adapt_delta = 0.95)
)
fit_long_gp_ar <- add_criterion(fit_long_gp_ar, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_gp_ar$fit))
[1] 349.017

Gaussian Process AR1 Model: Autocorrelation results

Some autocorrelation may be present:

posterior_summary(fit_long_gp_ar, variable = "ar") |>
  round(3)
      Estimate Est.Error Q2.5 Q97.5
ar[1]    0.374     0.119 0.15 0.612

A more formal model comparison may be theoretically advantageous but given that we add only a single parameter (the AR1 effect), we should probably gain more than we loose by including it.

Gaussian Process AR1 Model: Visualization of effects

conditional_effects(fit_long_gp_ar, "AVISITN_c:TRT01P")

Time-to-Event analysis

Overview and Analysis Goals

  • Oncology late phase trial to evaluate efficacy of an active drug given in addition to two similar standard of care (SoC-A and SoC-B), which vary geographically

  • A total of 4 trial arms studied combining

    • SoC-A & active vs SoC-A
    • SoC-B & active vs SoC-B
  • Analysis needs to account for:

    • SoC-A and SoC-B are known for similar efficacy
    • Active drug efficacy is expected to be consistent with SoC-A and SoC-B
      \Rightarrow interest in average treatment effect

Key analysis goal: Reflect prior knowledge on similarity and increase efficiency in estimating average treatment effect \Rightarrow control model parametrization.

Simulated Data Set

First few rows of the simulated dataset:

y event trt soc arm
1 7.70 0 ctl ChA ctlChA
2 0.10 0 act ChA actChA
3 4.75 0 ctl ChA ctlChA
4 2.75 0 act ChA actChA
5 3.61 1 ctl ChA ctlChA
6 0.94 1 act ChA actChA
7 0.26 1 ctl ChA ctlChA
8 9.21 1 act ChA actChA
9..199
200 11.81 0 act ChA actChA

Contrasts Define Model Parametrization

Overall mean (intercept): \mu = \frac{1}{4}( \mu_{actChA} + \mu_{ctlChA} + \mu_{actChB} + \mu_{ctlChB} )

Average difference between the active and control arms: \delta_{avg.diff} = \frac{1}{2} ( [\mu_{actChA} - \mu_{ctlChA}] + [ \mu_{actChB} - \mu_{ctlChB}] )

Half of the difference in treatment effect between the two SOC: \delta_{effect} = \frac{1}{2}( [ \mu_{actChA} - \mu_{ctlChA}] - [ \mu_{actChB} - \mu_{ctlChB}] )

Difference between the two control arms: \delta_{control} = - \mu_{ctlChA} + \mu_{ctlChB}

Contrasts Defined Via Inverse Mapping

First specify the groups as a function of the contrasts:

cc_inv
              arm
contrast       actChA ctlChA actChB ctlChB
  intercept     1/4    1/4    1/4    1/4  
  effectAvg     1/2   -1/2    1/2   -1/2  
  deltaEffect   1/2   -1/2   -1/2    1/2  
  deltaControl    0     -1      0      1  

Contrasts: Contrast Matrix

Then invert the matrix to get the actual contrast matrix:

cc <- solve(cc_inv)
       intercept effectAvg deltaEffect deltaControl
actChA    1       1/2         1        -1/2        
ctlChA    1      -1/2         0        -1/2        
actChB    1       1/2        -1         1/2        
ctlChB    1      -1/2         0         1/2        

To use the custom contrasts, we have to assign these to the respective factor column of the data (but exclude the overall intercept):

contrasts(sim$arm) <- cc[, -1]

The Weibull Family in brms

When using family weibull in brms, we are modeling the mean time \mu until the event, not the hazard function!

Parameterize as mean time \mu and shape \alpha such that, with scale s = \mu / \Gamma(1 + \frac{1}{\alpha}):

\text{Weibull}(t) = \frac{\alpha}{s} \left( \frac{t}{s} \right)^{\alpha-1} \, \exp\left(-\left(\frac{t}{s}\right)^\alpha\right)

  • This is an accelerated failure time model since the survivor function has the property of S_i(t) = S_{\text{weibull}}(t/\mu_i).
  • When using a log linear model on \mu
    • the intercept is the \log-mean event time
    • regression coefficients are interpretable as speedup/slowdown of the process

See the brms vignette online on families for family parametrization details.

Specify brms Weibull Model

model_weibull1 <- bf(y | cens(1-event) ~ 1 + arm,
                     family=weibull())

prior_weibull1 <- 
  prior(normal(meanInter, log(4)/1.64), class="Intercept") +
  prior(normal(0, sdEffect), coef=armeffectAvg) +
  prior(normal(0, sdDeltaEffect), coef=armdeltaEffect) +
  prior(normal(0, sdDeltaControl), coef=armdeltaControl) +
  prior(gamma(0.1, 0.1), class=shape)

We use variable symbols like meanInter, sdEffect and the like, which need to be defined with stanvars,

stanvars_weibull1 <- 
  stanvar(-log(log(2)/8), name = "meanInter") + 
  stanvar(log(2)/1.64, name = "sdEffect") +
  stanvar(log(1.25)/1.64, name = "sdDeltaEffect") +
  stanvar(log(1.25)/1.64, name = "sdDeltaControl")

This avoids model recompilation when changing priors.

Fit brms Weibull Model

fit_weibull1 <- brm(
  formula = model_weibull1, 
  data = sim, 
  prior = weibull_prior1,
  stanvars = stanvars_weibull1,
  seed=345665
)

Model Summary

summary(fit_weibull1)
 Family: weibull 
  Links: mu = log; shape = identity 
Formula: y | cens(1 - event) ~ 1 + arm 
   Data: sim (Number of observations: 200) 

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           2.16      0.12     1.94     2.43 1.00     3694     2725
armeffectAvg        0.27      0.18    -0.10     0.63 1.00     3912     2771
armdeltaEffect      0.01      0.10    -0.19     0.21 1.00     4388     2938
armdeltaControl     0.06      0.12    -0.16     0.29 1.00     4537     2953

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.97      0.08     0.83     1.13 1.00     3625     2843

Posterior Predictive Checks

pp_check(
  fit_weibull1, type = "km_overlay", 
  status_y = sim$event, ndraws = 100
)

The model predictions assume no censoring

Including Historical Data

Suppose we have historical data available from a previous study where only SoC was applied. This extends our control group data:

y event arm hist1
1 3.64 1 ctlChA 1
2 8.39 1 ctlChA 1
3 6.46 0 ctlChA 1
4 3.18 1 ctlChA 1
5 1.29 1 ctlChA 1
6 5.90 0 ctlChA 1
7 1.99 0 ctlChA 1
8 2.66 1 ctlChA 1
9..399
400 1.03 1 ctlChA 1

Including Historical Data: Analysis Code

  • Approximation of a meta-analytic-approach (MAP) by allowing a deviation for the historical data set on the overall intercept through an indicator covariate
  • Use of a heavy tailed Student-t prior resembles heavy-tailed MAP random effect
sim_comb <- bind_rows(sim, hdata1)
contrasts(sim_comb$arm) <- cc[, -1]

model_weibull2 <- bf(
  y | cens(1-event) ~ 1 + arm + hist1, 
  family=weibull()
)

prior_weibull2 <- prior_weibull1 +
  prior(student_t(6, 0, sdHist), coef = hist1)

stanvars_weibull2 <- stanvars_weibull1 +
  stanvar(log(1.8)/1.64, name = "sdHist")
fit_weibull2 <- brm(
  formula = model_weibull2, 
  data = sim_comb, 
  prior = weibull_prior2,
  stanvars = stanvars_weibull2,
  seed=4567886
)

Summarize the Model

summary(fit_weibull2)
 Family: weibull 
  Links: mu = log; shape = identity 
Formula: y | cens(1 - event) ~ 1 + arm + hist1 
   Data: sim_comb (Number of observations: 600) 

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept           2.12      0.10     1.93     2.32 1.00     4683     3290
armeffectAvg        0.29      0.17    -0.06     0.61 1.00     4556     3063
armdeltaEffect      0.00      0.10    -0.20     0.21 1.00     4639     2941
armdeltaControl     0.07      0.12    -0.16     0.30 1.00     4302     3188
hist1              -0.20      0.14    -0.46     0.06 1.00     4252     3091

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     1.00      0.04     0.92     1.08 1.00     4830     3127

Time-to-Event Modelling: Outlook

In the online case study you additionally find:

  • Additional details and model justification based on a real dataset
  • Include historical data of a trial reporting on the average effect of SoC-A and SoC-B
  • Add custom coded contrasts to further improve flexibility of historical data analysis

Course wrap-up

Summary

  • Diverse opportunities for applied modelling to inform good drug-development decisions
  • Bayesian paradigm is well suited for many of these situations
    • Availability of meaningful prior information
    • Desire for probabilistically interpretable statements about unknowns and future observable quantities
  • brms is a powerful and highly flexible engine for applied modelling , facilitating straightforward model specification and inference

Looking ahead

  • We hope you have:
    • Become familiar with brms syntax and workflow
    • Seen its versatility for statistical modelling in drug development
    • Gained hands-on experience with the package from guided exercises
  • And that you feel empowered to use brms going forward!

Resources

Thank you